Segmenting DNA sequence into 'words'
Abstract
This paper presents a novel method to segment (decode) DNA sequences based on a statistical language model. First, by analyzing the genomes of 12 model species, we find that most DNA “words” are 12 to 15 bp long. We then apply an unsupervised approach to build a DNA vocabulary and design a DNA sequence segmentation method. We also find that different genomes are likely to use similar ‘languages’.
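To make the idea of vocabulary-based segmentation concrete, the following is a minimal sketch, not the paper's actual algorithm. It assumes a unigram model (each word has an independent probability) and picks the most probable split by dynamic programming. The function name `segment`, the toy vocabulary, its probabilities, and the fallback penalty `unk_logp` are all hypothetical choices made for illustration; only the 15-base upper limit on word length echoes the abstract.

```python
import math

def segment(seq, vocab, max_len=15, unk_logp=-20.0):
    """Return the most probable split of seq into words under a unigram model.

    vocab maps word -> probability (hypothetical values, not from the paper).
    Bases not covered by any vocabulary word fall back to single-base
    "words" with a fixed log-probability penalty unk_logp (an assumption).
    """
    n = len(seq)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-probability of seq[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word in seq[:i]
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = seq[j:i]
            if word in vocab:
                logp = math.log(vocab[word])
            elif i - j == 1:
                logp = unk_logp     # unknown single base
            else:
                continue
            score = best[j] + logp
            if score > best[i]:
                best[i] = score
                back[i] = j
    # Walk the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(seq[back[i]:i])
        i = back[i]
    return words[::-1]

# Toy vocabulary with made-up probabilities, for illustration only.
toy_vocab = {"ATG": 0.4, "GCC": 0.3, "TAA": 0.2, "ATGGCC": 0.1}
print(segment("ATGGCCTAA", toy_vocab))
```

With this toy vocabulary the three-word split is preferred over the split that uses the longer word `ATGGCC`, because 0.4 × 0.3 × 0.2 exceeds 0.1 × 0.2. A real vocabulary would have to be learned from genomes in an unsupervised way, which this sketch does not attempt.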
Similar resources
Segmenting DNA sequence into 'words' based on statistical language model
This paper presents a novel method to segment (decode) DNA sequences based on an n-gram statistical language model. First, by analyzing the genomes of 12 model species, we find that most DNA “words” are 12 to 15 bp long. The bound of the language entropy of DNA sequences is about 1.5674 bits. After building an n-gram biological language model, we design an unsupervised ‘probability approach...
Phoneme Segmenting Alignment with the Common Core Foundational Skills
In 2006, the easyCBM reading assessment system was developed to support the progress monitoring of phoneme segmenting, letter names and sounds recognition, word reading, passage reading fluency, and comprehension skill development in elementary schools. More recently, the Common Core Standards in English Language Arts have been introduced as a framework for outlining grade-level achievement exp...
Segmenting Narrative Text into Coherent Scenes
This paper describes a quantitative indicator for segmenting narrative text into coherent scenes. The indicator, called the lexical cohesion profile (LCP), records lexical cohesiveness of words in a fixed-length window moving word by word on the text. The cohesiveness of words, which represents their coherence, is computed by spreading activation on a semantic network. The basic idea of LCP is: (1...
The Comparison of different Procedures for DNA extraction from paraffin-embedded Tissues: A commercial kit and a traditional method based on heating
Background and objectives: Paraffin-embedded tissues and clinical samples are a valuable resource for molecular genetic studies, but extracting high-quality genomic DNA from these tissues is still problematic. In the present study, the efficiency of two DNA extraction protocols, a commercial kit and a traditional method based on heating and proteinase K, was compared. Mate...
MOSAIC: segmenting multiple aligned DNA sequences
MOSAIC is a set of tools for the segmentation of multiple aligned DNA sequences into homogeneous zones. The segmentation is based on the distribution of mutational events along the alignment. As an example, the analysis of one repeated sequence belonging to the subtelomeric regions of the yeast genome is presented. Availability: Free access from ftp://ftp.biomath.jussieu.fr/pub/pape...
Publication date: 2012